SUMMARY

Sequences encoding the surface projection glycoprotein of the coronavirus, murine 
hepatitis virus (MHV), strain JHM, have been cloned into pAT153 using cDNA 
produced by priming with specific oligonucleotides on infected cell RNA. The regions 
of three clones, pJMS1010, pJS112 and pJS92, which together encompass the surface
protein gene have been sequenced by the chain termination method. The sequence of 
the primary translation product, deduced from the DNA sequence, predicts a 
polypeptide of 1235 amino acids with a molecular weight of 136600. This polypeptide 
displays the features characteristic of a group 1 membrane protein: an amino-terminal
signal sequence and carboxy-terminal membrane and cytoplasmic domains. There are 
21 potential glycosylation sites in the polypeptide and a cysteine-rich region in the 
vicinity of the transmembrane domain. During maturation proteolytic processing of 
the polypeptide occurs and at positions 624 to 628 the sequence Arg-Arg-Ala-Arg- 
Arg is found, which is similar to a number of basic sequences involved in the cleavage 
of enveloped RNA virus glycoproteins. The fusogenic properties of the MHV surface 
protein do not appear to correlate with a strongly hydrophobic region at the putative 
amino terminus of the carboxy-terminal cleavage product. 

INTRODUCTION 

Coronaviruses are pleomorphic, enveloped viruses which replicate in the cytoplasm of vertebrate cells and are associated with diseases of economic importance (Siddell et al., 1983a). In the laboratory, murine hepatitis virus (MHV) has been used extensively for the study of viral pathogenesis, in particular as a component of a model for demyelinating diseases of man (Knobler et al., 1982; Watanabe et al., 1983; Massa et al., 1986). The MHV genome is a monopartite, positive-stranded RNA of approximately 18 kb. The genome encodes the nucleocapsid (N), membrane (M or E1) and surface (S or E2) proteins of the virion, as well as several non-structural proteins (Sturman & Holmes, 1983; Siddell et al., 1983b).

The MHV S protein is synthesized on membrane-bound ribosomes as a co-translationally N-glycosylated polypeptide with an apparent mol. wt. of 150000 (Niemann et al., 1982; Holmes et al., 1981; Siddell et al., 1981). The polypeptides synthesized in vitro or in tunicamycin-treated cells have mol. wt. of approximately 120000 (Rottier et al., 1981; Siddell, 1983). During transport within the cell, oligosaccharides are trimmed and terminal sugars are added, resulting in a 180000 mol. wt. S polypeptide. Shortly before, or at the time of, virus release a proportion of S is cleaved into two polypeptides of approximately 90000 mol. wt., S1 and S2 (Niemann et al., 1982; Sturman et al., 1985). S1 and S2 (which are also referred to as 90B and 90A; Sturman et al., 1985) cannot be distinguished by SDS-PAGE but can be separated by hydroxyapatite chromatography. It has been shown that S2 is acylated (Ricard & Sturman, 1985). The cleavage of the S polypeptide is a host cell-dependent event (Frana et al., 1985) and activates its cell-fusing ability. The S protein is also responsible for the attachment/infectivity of the MHV virion, and some monoclonal hybridoma antibodies which react with the S protein are able to mediate virus neutralization in vitro and passively protect mice against lethal virus challenge in vivo (Collins et al., 1982).

The organization and expression of the MHV genome have been studied in detail (for reviews,
see Holmes, 1985; Siddell, 1986) (Fig. 1). Briefly, in MHV-infected cells six subgenomic
mRNAs, as well as genome-sized RNA, are produced. These mRNAs form a 3' co-terminal 
nested set and each also has a common 5' leader sequence of about 70 bases (Lai et al., 1984). The 
available evidence suggests that only the information contained within the ‘unique’ sequences at 
the 5' end of each mRNA (i.e. those absent from the next smallest RNA) is translated into 
protein (Siddell, 1986). The translation of size-fractionated MHV mRNAs has shown that 
subgenomic mRNA 3 encodes the S protein (Siddell, 1983). 
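The nested-set arrangement described above can be made concrete with a small sketch. It represents each mRNA only by the length of its body. The common 5' leader of about 70 bases is ignored because every mRNA carries it. The 'unique' region of an mRNA is then the part of its body that is absent from the next smallest mRNA. The function name and the lengths in the example are hypothetical placeholders, not measured MHV values.

```python
from typing import Dict


def unique_region_lengths(body_lengths: Dict[int, int]) -> Dict[int, int]:
    """Length of the 5' 'unique' region of each mRNA in a 3' co-terminal nested set.

    body_lengths maps mRNA number -> body length in bases (leader excluded).
    As in the MHV naming scheme, a higher mRNA number means a smaller RNA.
    """
    if not body_lengths:
        return {}
    names = sorted(body_lengths)  # largest RNA first, smallest last
    result: Dict[int, int] = {}
    for current, smaller in zip(names, names[1:]):
        diff = body_lengths[current] - body_lengths[smaller]
        if diff <= 0:
            raise ValueError(f"mRNA {current} is not longer than mRNA {smaller}")
        # Only this 5' segment is missing from the next smallest mRNA.
        result[current] = diff
    # The smallest mRNA has no smaller partner, so its whole body is 'unique'.
    result[names[-1]] = body_lengths[names[-1]]
    return result


if __name__ == "__main__":
    # Hypothetical body lengths, for illustration only.
    print(unique_region_lengths({5: 3000, 6: 2200, 7: 1500}))
```

Under the model described in the text, only the segment returned for each mRNA would be translated.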

Previously, we have isolated cDNA clones containing overlapping viral inserts which encompass approximately 4.6 kb at the 3' end of the MHV-JHM genome (Skinner & Siddell, 1983, 1985; Skinner et al., 1985; Pfleiderer et al., 1986). Using specific oligonucleotide primers, we have now isolated two further clones which contain inserts extending to the 5' end of mRNA 3. The regions of the three clones which together contain the MHV S gene (Fig. 1) have been completely sequenced on both strands. This sequence, together with the predicted amino acid sequence of the S gene product, is presented in this paper.

METHODS 

cDNA cloning. The isolation and characterization of the plasmid pJMS1010 has been previously described (Skinner et al., 1985). The growth of Sac(−) cells, the propagation of MHV-JHM stocks and the isolation of polyadenylated RNA from MHV-JHM-infected cells have also been described (Siddell et al., 1980). The oligonucleotide primers A (3' GTCGACGACCACACGG 5'), B (3' GTGTGGGACATTCGGAT 5') and others were synthesized using the phosphoramidite method on an Applied Biosystems 380A DNA Synthesizer. cDNA synthesis was carried out using the method of Gubler & Hoffman (1983) with slight modifications. In particular, prior to tailing the double-stranded cDNA with dC residues, potential RNA overhangs were removed by treatment with DNase-free RNase A (10 µg/ml) for 8 min at 37 °C. The tailed ds cDNA was cloned into dG-tailed PstI-cleaved pAT153. This material was used to transform Escherichia coli DH1, and selection was made for tetracycline resistance. Clones containing viral inserts were identified by colony hybridization, using the cDNA synthesis primer, ³²P-labelled with polynucleotide kinase, as probe. The size of viral inserts in plasmids from hybridizing clones was determined by gel electrophoresis of PstI-cleaved DNA. An oligonucleotide (3' GCATGCCTGCGGTTAG 5'), which corresponds to a region near the 5' end of the MHV-JHM mRNA leader (Skinner & Siddell, 1983), was used in hybridizations to identify the plasmid pJS92.
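Primers A and B are written 3'→5' and are complementary to the viral mRNA. Base pairing is antiparallel, so the mRNA-sense target can be read off by complementing each base of the 3'→5' string in place, with no reversal. Reversing the string gives the primer in the usual 5'→3' orientation. The sketch below assumes plain A/C/G/T input, and the function names are illustrative. For both primers, the target sequences it yields can be found, in DNA form, in the sequence of Fig. 2.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mrna_target(primer_3_to_5: str) -> str:
    """mRNA-sense target (5'->3', RNA alphabet) of a primer written 3'->5'.

    The primer anneals antiparallel to the mRNA, so base k of the primer read
    3'->5' pairs with base k of the mRNA read 5'->3': complement, do not reverse.
    """
    seq = primer_3_to_5.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("expected a DNA primer containing only A, C, G, T")
    return seq.translate(DNA_COMPLEMENT).replace("T", "U")


def as_5_to_3(primer_3_to_5: str) -> str:
    """Rewrite a primer given 3'->5' in the conventional 5'->3' orientation."""
    return primer_3_to_5.upper()[::-1]


PRIMERS = {"A": "GTCGACGACCACACGG", "B": "GTGTGGGACATTCGGAT"}

if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, len(primer), as_5_to_3(primer), mrna_target(primer))
```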

Subcloning in M13. Fragments of the viral inserts contained within pJMS1010, pJS112 and pJS92 were generated by a variety of restriction enzymes and were cloned, either as mixtures or as single fragments (purified by electroelution from agarose or acrylamide gels), into the M13 vectors mp8, mp9, mp18 and mp19. Where necessary, specific clones were identified by hybridization to single-stranded DNA probes generated from characterized M13 clones (O'Hare et al., 1983).

DNA sequencing. M13 dideoxynucleotide sequencing was carried out using [α-³⁵S]dATP. The complete sequence was obtained on both strands. To complete the project, oligonucleotides complementary to specific MHV sequences were synthesized and used to prime the sequencing reactions. Sequence data were analysed and assembled using the programs of Staden (1982a).

Southern/Northern analysis. Northern blot analysis of RNA following electrophoresis in 1% agarose-formaldehyde gels, Southern blot analyses of DNA and nick translations were performed according to Maniatis et al. (1982).


RESULTS 

The position of the MHV sequences contained within the plasmid pJMS1010 has been previously determined by Northern blot and sequence analysis (Skinner et al., 1985). The viral insert within the plasmid extends from within the M protein gene (which is translated from mRNA 6) to a position approximately 2.6 kb from the 5' end of mRNA 3 (Fig. 1). A 16-base oligonucleotide, primer A, complementary to a sequence towards the 5' end of the pJMS1010 insert was used to prime cDNA synthesis from infected cell poly(A)-containing RNA. Plasmid pJS112, obtained from this experiment, contained a 2.2 kb insert which hybridized in Northern blots to MHV mRNAs 3, 2 and 1 (data not shown). Sequence analysis confirmed that the 3' end of the insert corresponded to the cDNA synthesis primer. In a second cDNA synthesis experiment, a 17-base oligonucleotide, primer B, complementary to a sequence towards the 5' end of the pJS112 insert was used to obtain the plasmid pJS92. pJS92 contained an insert of 530 bp and hybridized in Northern blot analysis to all viral mRNAs. Hybridization of the pJS92 insert to the cDNA synthesis primer and to a primer corresponding to the MHV-JHM leader sequences (see Methods) was confirmed by Southern blot analysis of PstI-cleaved pJS92 DNA (data not shown).

Fig. 1. Genomic organization of murine hepatitis virus. The relationship between the 3' co-terminal nested set of mRNAs and the viral genome is shown, together with the coding regions for the structural proteins, nucleocapsid (N), membrane (M) and surface (S), specified by mRNAs 7, 6 and 3, respectively. The arrangement of the cDNA clones and the positions of the primers used are shown. The sequences presented in Fig. 2 are represented by the hatched box.

A 3780-base sequence containing the gene encoding the MHV-JHM S propolypeptide (i.e. the predicted primary translation product) is presented in Fig. 2. Immediately preceding the AUG initiation codon is the sequence UCUAAAC. This sequence is identical to genomic sequences preceding the known or presumed 5' initiation codons of mRNAs 7, 5 and 4, and differs by only one base from the sequence UCCAAAC preceding the initiation codon of mRNA 6 (Skinner et al., 1985; Skinner & Siddell, 1985; Pfleiderer et al., 1986). It is thought that these sequences, referred to as regions of homology, are involved in regulating the synthesis of MHV mRNAs (Armstrong et al., 1984; Spaan et al., 1983).
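The sketch below shows how the UCUAAAC element sits relative to the initiation codon. It scans the first 90 bases of Fig. 2 (DNA sense, so T stands for U) for 7-base windows that lie within one mismatch of UCUAAAC and are immediately followed by ATG. Allowing one mismatch is an assumption, chosen so that the mRNA 6 variant UCCAAAC would also be accepted. This is an illustrative search, not the procedure used by the authors.

```python
from typing import List, Tuple

# First 90 bases shown in Fig. 2 (DNA sense); the S gene AUG is at position 31.
FIG2_START = (
    "CTTGTAGTTTAAATCTAATCTAATCTAAACATGCTGTTCGTCTTTATTTT"
    "ACTATTACCCTCTTGTTTAGGGTATATTGGTGATTTTAGA"
)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def motifs_before_atg(seq: str, motif: str = "UCUAAAC",
                      max_mismatches: int = 1) -> List[Tuple[int, int, int]]:
    """Windows within max_mismatches of motif that are immediately followed by ATG.

    Returns (1-based motif start, 1-based ATG position, number of mismatches).
    """
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    k = len(motif)
    hits = []
    for i in range(len(seq) - k - 2):
        if seq[i + k:i + k + 3] != "ATG":
            continue
        d = hamming(seq[i:i + k], motif)
        if d <= max_mismatches:
            hits.append((i + 1, i + k + 1, d))
    return hits


if __name__ == "__main__":
    print(hamming("UCUAAAC", "UCCAAAC"))  # mRNA 3 vs mRNA 6 elements
    print(motifs_before_atg(FIG2_START))
```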

The AUG codon at position 31 initiates an open reading frame (ORF) of 3705 bases (excluding the termination codon) encoding a polypeptide of 1235 amino acids with a predicted mol. wt. of 136600. This ORF ends with a single UGA termination codon. The sequence context of the initiating codon, AAACAUGC, is frequently found amongst functional eukaryotic initiator sequences (Kozak, 1983).
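The ORF figures above can be cross-checked with a few lines of arithmetic. The sketch takes the AUG at position 31 and locates the first in-frame termination codon in the last row of Fig. 2, which begins at base 3691. From these it derives the ORF length and the codon count. Dividing 136600 by the number of residues gives a mean residue mass of about 110.6, close to the value usually assumed for proteins. The sketch also reports the bases at -3 and +4 of the AAACAUGC context, the two positions commonly emphasized in Kozak's rules. The function names are illustrative.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Last row of Fig. 2 (DNA sense), beginning at base 3691.
LAST_ROW = ("CACCAGGACAGTATTGTGATACATAATATTTCAGCCCATGAGGATTGACTATC"
            "ACAGCCTCTCCTGGAAAGACAGAAAATCTAAACAATT")


def first_stop(row: str, row_start: int, atg_pos: int) -> int:
    """1-based position of the first stop codon in the reading frame of atg_pos."""
    offset = (atg_pos - row_start) % 3  # shift into the AUG reading frame
    for i in range(offset, len(row) - 2, 3):
        if row[i:i + 3] in STOP_CODONS:
            return row_start + i
    raise ValueError("no in-frame stop codon in this row")


def orf_summary(atg_pos: int, stop_pos: int, mol_wt: float) -> Tuple[int, int, float]:
    """Coding length (stop excluded), codon count and mean residue mass."""
    coding_len = stop_pos - atg_pos
    if coding_len <= 0 or coding_len % 3:
        raise ValueError("termination codon is not in frame downstream of the AUG")
    n_codons = coding_len // 3
    return coding_len, n_codons, mol_wt / n_codons


def kozak_positions(context: str, atg_index: int) -> Tuple[str, str]:
    """Bases at -3 and +4 relative to the A (+1) of AUG within a context string."""
    return context[atg_index - 3], context[atg_index + 3]


if __name__ == "__main__":
    stop = first_stop(LAST_ROW, 3691, 31)
    print(stop, orf_summary(31, stop, 136600))
    print(kozak_positions("AAACAUGC", 4))
```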

A number of structural features of the S propolypeptide are noteworthy. Firstly, within the MHV S propolypeptide sequence there are 21 potential N-glycosylation sites of the type Asn-X-Thr/Ser (assuming that X is not Pro) (Fig. 2). The distribution of these sites is also shown in Fig. 3. It is clear that at least one cluster of potential glycosylation sites occurs in the carboxy-terminal region of the polypeptide, between amino acids 1092 and 1158. Secondly, a hydropathicity plot of the amino acid sequence of the S propolypeptide, determined using the
CTTGTAGTTTAAATCTAATCTAATCTAAACATGCTGTTCGTCTTTATTTTACTATTACCCTCTTGTTTAGGGTATATTGGTGATTTTAGA
MetLeuPheValPheIleLeuLeuLeuProSerCysLeuGlyTyrIleGlyAspPheArg
MLFVFILLLPSCLGYIGDFR

TGTATCCAGACCGTGAATTATAACGGCAATAATGCTTCTGCGCCTAGCATTAGCACCGAAGCAGTCGATGTTTCCAAAGGTCGGGGCACT
CysIleGlnThrValAsnTyrAsnGlyAsnAsnAlaSerAlaProSerIleSerThrGluAlaValAspValSerLysGlyArgGlyThr
CIQTVNYNGNNASAPSISTEAVDVSKGRGT

TACTATGTTTTAGATCGTGTTTACTTAAATGCCACGTTATTGCTTACTGGTTATTATCCTGTGGACGGTTCCAATTATCGGAATCTCGCG
TyrTyrValLeuAspArgValTyrLeuAsnAlaThrLeuLeuLeuThrGlyTyrTyrProValAspGlySerAsnTyrArgAsnLeuAla
YYVLDRVYLNATLLLTGYYPVDGSNYRNLA

CTTACAGGCACTAATACCTTAAGCCTTACGTGGTTTAAACCACCCTTTCTAAGTGAGTTTAATGATGGTATATTTGCTAAGGTCCAGAAC 
LeuThrGlyThrAsnThrLeuSerLeuThrTrpPheLysProProPheLeuSerGluPheAsnAspGlyIlePheAlaLysValGlnAsn
LTGTNTLSLTWFKPPFLSEFNDGIFAKVQN

CTCAAGACAAATACGCCAACAGGTGCAACCTCATATTTTCCCACTATAGTTATAGGTAGTTTGTTTGGTAACACTTCCTATACCGTAGTT 
LeuLysThrAsnThrProThrGlyAlaThrSerTyrPheProThrIleValIleGlySerLeuPheGlyAsnThrSerTyrThrValVal
LKTNTPTGATSYFPTIVIGSLFGNTSYTVV 

TTAGAGCCATATAATAATATTATAATGGCTTCTGTTTGTACATATACCATTTGTCAATTACCTTACACACCCTGTAAGCCTAATACCAAT 
LeuGluProTyrAsnAsnIleIleMetAlaSerValCysThrTyrThrIleCysGlnLeuProTyrThrProCysLysProAsnThrAsn
LEPYNNIIMASVCTYTICQLPYTPCKPNTN

GGTAATCGTGTTATTGGATTTTGGCACACAGATGTCAAACCGCCGATTTGTCTTTTAAAGCGTAATTTTACGTTTAATGTTAATGCCCCT 
GlyAsnArgValIleGlyPheTrpHisThrAspValLysProProIleCysLeuLeuLysArgAsnPheThrPheAsnValAsnAlaPro
GNRVIGFWHTDVKPPICLLKRNFTFNVNAP 

TGGCTTTATTTCCATTTTTATCAGCAGGGTGGTACTTTTTATGCGTACTATGCGGATAAACCTTCCGCTACTACGTTTTTGTTTAGTGTG 
TrpLeuTyrPheHisPheTyrGlnGlnGlyGlyThrPheTyrAlaTyrTyrAlaAspLysProSerAlaThrThrPheLeuPheSerVal 
WLYFHFYQQGGTFYAYYADKPSATTFLFSV 

TATATTGGCGACATTTTAACACAGTATTTTGTGTTACCTTTTATTTGTACTCCAACAGCTGGTAGCACTTTAGCTCCGCTCTATTGGGTT
TyrIleGlyAspIleLeuThrGlnTyrPheValLeuProPheIleCysThrProThrAlaGlySerThrLeuAlaProLeuTyrTrpVal
YIGDILTQYFVLPFICTPTAGSTLAPLYWV

ACACCTTTACTTAAGCGCCAATATTTGTTTAATTTTAATGAAAAGGGTGTCATTACTAGTGCTGTTGATTGCGCCAGCAGCTACATTAGT 
ThrProLeuLeuLysArgGlnTyrLeuPheAsnPheAsnGluLysGlyValIleThrSerAlaValAspCysAlaSerSerTyrIleSer
TPLLKRQYLFNFNEKGVITSAVDCASSYIS

GAAATAAAATGTAAGACCCAAAGTCTCTTACGGAGTACTGGTGTCTATGATCTATCCGGTTACACGGTCCAACCTGTTGGAGTTGTGTAC 
GluIleLysCysLysThrGlnSerLeuLeuProSerThrGlyValTyrAspLeuSerGlyTyrThrValGlnProValGlyValValTyr 
EIKCKTQSLLPSTGVYDLSGYTVQPVGVVY

CGGCGTGTTCCTAACCTACCTGATTGTAAAATAGAGGAATGGCTCACTGCTAAATCTGTGCCGTCACCTCTCAATTGGGAGCGTAGGACT
ArgArgValProAsnLeuProAspCysLysIleGluGluTrpLeuThrAlaLysSerValProSerProLeuAsnTrpGluArgArgThr
RRVPNLPDCKIEEWLTAKSVPSPLNWERRT

TTCCAAAATTGTAATTTTAATTTAAGCAGCCTGCTACGTTATGTCCAGGCTGAGTCTTTGTCGTGTAATAATATTGATGCGTCCAAAGTG 
PheGlnAsnCysAsnPheAsnLeuSerSerLeuLeuArgTyrValGlnAlaGluSerLeuSerCysAsnAsnIleAspAlaSerLysVal
FQNCNFNLSSLLRYVQAESLSCNNIDASKV

TATGGTATGTGCTTTGGTAGTGTCTCAGTTGATAAGTTTGCTATCCCCCGAAGCCGTCAAATTGATTTACAAATTGGCAACTCCGGATTT
TyrGlyMetCysPheGlySerValSerValAspLysPheAlaIleProArgSerArgGlnIleAspLeuGlnIleGlyAsnSerGlyPhe
YGMCFGSVSVDKFAIPRSRQIDLQIGNSGF

TTGCAAACGGCTAATTATAAGATTGATACCGCTGCCACATCATGTCAGCTGTATTACAGTCTTCCTAAGAATAATGTTACCATAAATAAC
LeuGlnThrAlaAsnTyrLysIleAspThrAlaAlaThrSerCysGlnLeuTyrTyrSerLeuProLysAsnAsnValThrIleAsnAsn
LQTANYKIDTAATSCQLYYSLPKNNVTINN

TATAACCCCTCGTCTTGGAATAGGAGGTATGGTTTTAAAGTAAATGATCGCTGCCAAATTTTTGCTAACATATTGTTAAATGGCATTAAT
TyrAsnProSerSerTrpAsnArgArgTyrGlyPheLysValAsnAspArgCysGlnIlePheAlaAsnIleLeuLeuAsnGlyIleAsn
YNPSSWNRRYGFKVNDRCQIFANILLNGIN

AGTGGGACTACGTGTTCCACAGATTTACAATTGCCTAATACTGAAGTGGCCACTGGCGTTTGCGTCAGATATGACCTCTATGGTATTACT
SerGlyThrThrCysSerThrAspLeuGlnLeuProAsnThrGluValAlaThrGlyValCysValArgTyrAspLeuTyrGlyIleThr
SGTTCSTDLQLPNTEVATGVCVRYDLYGIT 

GGTCAAGGTGTTTTTAAAGAGGTCAAGGCTGACTATTATAATAGCTGGCAGGCCCTATTATATGATGTTAATGGTAACTTAAACGGGTTC
GlyGlnGlyValPheLysGluValLysAlaAspTyrTyrAsnSerTrpGlnAlaLeuLeuTyrAspValAsnGlyAsnLeuAsnGlyPhe
GQGVFKEVKADYYNSWQALLYDVNGNLNGF

CGTGACCTTACCACTAACAAGACTTATACGATAAGGAGCTGTTATAGTGGCCGTGTTTCTGCTGCATATCATAAAGAAGCACCCGAACCG
ArgAspLeuThrThrAsnLysThrTyrThrIleArgSerCysTyrSerGlyArgValSerAlaAlaTyrHisLysGluAlaProGluPro
RDLTTNKTYTIRSCYSGRVSAAYHKEAPEP

GCTCTGCTCTATCGTAATATAAATTGTAGTTATGTTTTTACTAATAATATTTCCCGTGAGGAAAACCCCCTTAACTATTTTGATAGTTAT
AlaLeuLeuTyrArgAsnIleAsnCysSerTyrValPheThrAsnAsnIleSerArgGluGluAsnProLeuAsnTyrPheAspSerTyr
ALLYRNINCSYVFTNNISREENPLNYFDSY

TTGGGTTGTGTTGTTAATGCTGATAACCGCACGGATGAGGCGCTTCCTAATTGCAATCTCCGTATGGGTGCTGGACTATGCGTAGATTAT
LeuGlyCysValValAsnAlaAspAsnArgThrAspGluAlaLeuProAsnCysAsnLeuArgMetGlyAlaGlyLeuCysValAspTyr
LGCVVNADNRTDEALPNCNLRMGAGLCVDY


TCAAAGTCACGCAGAGCCCGCCGATCAGTTTCTACTGGCTATCGATTAACCACATTCGAGCCATACATGCCGATGTTAGTCAATGATAGC 
SerLysSerArgArgAlaArgArgSerValSerThrGlyTyrArgLeuThrThrPheGluProTyrMetProMetLeuValAsnAspSer
SKSRRARRSVSTGYRLTTFEPYMPMLVNDS 


GTTCAATCCGTAGGTGGATTATATGAGATGCAAATACCAACCAATTTTACTATTGGTCATCATGAGGAATTCATCCAGATAAGGGCTCCC 
ValGlnSerValGlyGlyLeuTyrGluMetGlnIleProThrAsnPheThrIleGlyHisHisGluGluPheIleGlnIleArgAlaPro
VQSVGGLYEMQIPTNFTIGHHEEFIQIRAP

AAGGTGACTATAGATTGTGCTGCATTTGTTTGTGGTGATAACGCTGCATGCAGACAGCAGTTGGTTGAGTATGGCTCTTTTTGTGATAAT
LysValThrIleAspCysAlaAlaPheValCysGlyAspAsnAlaAlaCysArgGlnGlnLeuValGluTyrGlySerPheCysAspAsn
KVTIDCAAFVCGDNAACRQQLVEYGSFCDN

GTTAATGCCATTCTTAATGAGGTTAATAACCTCTTGGATAATATGCAATTACAAGTTGCTAGTGCATTAATGCAGGGTGTTACTATAAGT 
ValAsnAlaIleLeuAsnGluValAsnAsnLeuLeuAspAsnMetGlnLeuGlnValAlaSerAlaLeuMetGlnGlyValThrIleSer
VNAILNEVNNLLDNMQLQVASALMQGVTIS

TCGAGGCTGCCAGATGGCATCTCCGGCCCTATAGATGACATTAATTTCAGTCCTCTACTTGGATGCATAGGTTCAACATGTGCTGAAGAC 
SerArgLeuProAspGlyIleSerGlyProIleAspAspIleAsnPheSerProLeuLeuGlyCysIleGlySerThrCysAlaGluAsp
SRLPDGISGPIDDINFSPLLGCIGSTCAED

GGCAATGGACCTAGTGCGATACGGGGGCGTTCAGCTATAGAGGATTTATTATTTGACAAGGTCAAACTATCTGACGTTGGCTTTGTCGAG
GlyAsnGlyProSerAlaIleArgGlyArgSerAlaIleGluAspLeuLeuPheAspLysValLysLeuSerAspValGlyPheValGlu
GNGPSAIRGRSAIEDLLFDKVKLSDVGFVE

GCTTATAACAATTGCACTGGTGGTCAAGAAGTTCGCGACCTCCTTTGCGTACAGTCTTTTAATGGCATCAAAGTATTACCTCCCGTGTTG 
AlaTyrAsnAsnCysThrGlyGlyGlnGluValArgAspLeuLeuCysValGlnSerPheAsnGlyIleLysValLeuProProValLeu
AYNNCTGGQEVRDLLCVQSFNGIKVLPPVL

TCTGAGAGTCAAATCTCTGGCTACACAGCGGGTGCTACTGCGGCAGCTATGTTCCCACCTTGGACTGCAGCTGCTGGTGTGCCATTCAGT
SerGluSerGlnIleSerGlyTyrThrAlaGlyAlaThrAlaAlaAlaMetPheProProTrpThrAlaAlaAlaGlyValProPheSer
SESQISGYTAGATAAAMFPPWTAAAGVPFS

TTAAATGTTCAATATAGGATTAATGGTTTAGGTGTCACTATGAATGTTCTTAGTGAGAACCAAAAGATGATTGCTAGTGCTTTTAACAAC
LeuAsnValGlnTyrArgIleAsnGlyLeuGlyValThrMetAsnValLeuSerGluAsnGlnLysMetIleAlaSerAlaPheAsnAsn
LNVQYRINGLGVTMNVLSENQKMIASAFNN 

GCGCTCGGTGCTATTCAGGAAGGGTTCGATGCAACCAATTCTGCTCTAGGTAAGATCCAGTCCGTTGTTAATGCAAACGCTGAAGCACTT 

AlaLeuGlyAlaIleGlnGluGlyPheAspAlaThrAsnSerAlaLeuGlyLysIleGlnSerValValAsnAlaAsnAlaGluAlaLeu

ALGAIQEGFDATNSALGKIQSVVNANAEAL 

AATAATTTATTAAACCAACTTTCTAATAGGTTTGGTGCTATTAGTGCTTCTTTACAAGAAATTCTAACGCGGCTTGACGCTGTAGAAGCA 

AsnAsnLeuLeuAsnGlnLeuSerAsnArgPheGlyAlaIleSerAlaSerLeuGlnGluIleLeuThrArgLeuAspAlaValGluAla

NNLLNQLSNRFGAISASLQEILTRLDAVEA 

AAGGCCCAGATAGATCGTCTTATTAATGGCAGGTTAACTGCACTTAATGCGTATATATCCAAGCAACTCAGTGATAGTACGCTTATTAAA
LysAlaGlnIleAspArgLeuIleAsnGlyArgLeuThrAlaLeuAsnAlaTyrIleSerLysGlnLeuSerAspSerThrLeuIleLys
KAQIDRLINGRLTALNAYISKQLSDSTLIK 

TTTAGTGCTGCTCAGGCCATCGAAAAGGTCAATGAGTGCGTTAAGAGCCAAACTACGCGCATTAATTTCTGTGGCAATGGTAATCACATA 

PheSerAlaAlaGlnAlaIleGluLysValAsnGluCysValLysSerGlnThrThrArgIleAsnPheCysGlyAsnGlyAsnHisIle

FSAAQAIEKVNECVKSQTTRINFCGNGNHI 


TTATCACTTGTCCAGAATGCGCCTTATGGCTTATGTTTTATTCATTTCAGCTACGTGCCAACATCCTTTAAAACGGCAAATGTGAGTCCT 
LeuSerLeuValGlnAsnAlaProTyrGlyLeuCysPheIleHisPheSerTyrValProThrSerPheLysThrAlaAsnValSerPro
LSLVQNAPYGLCFIHFSYVPTSFKTANVSP

GGACTATGCATTTCTGGTGATAGAGGATTGGCACCTAAAGCTGGATATTTTGTTCAAGATAATGGAGAGTGGAAGTTCACAGGCAGTAAT 
GlyLeuCysIleSerGlyAspArgGlyLeuAlaProLysAlaGlyTyrPheValGlnAspAsnGlyGluTrpLysPheThrGlySerAsn 
GLCISGDRGLAPKAGYFVQDNGEWKFTGSN 


TATTACTACCCTGAACCCATTACAGATAAAAATAGTGTTGCCATGATCAGTTGCGCTGTGAATTACACAAAAGCGCCTGAAGTTTTCTTG 

TyrTyrTyrProGluProIleThrAspLysAsnSerValAlaMetIleSerCysAlaValAsnTyrThrLysAlaProGluValPheLeu
YYYPEPITDKNSVAMISCAVNYTKAPEVFL 

AACAACTCAATACCAAATCTACCCGACTTTAAGGAGGAGTTAGATAAATGGTTTAAGAATCAGACGTCTATTGCGCCTGATTTATCCCTC 
AsnAsnSerIleProAsnLeuProAspPheLysGluGluLeuAspLysTrpPheLysAsnGlnThrSerIleAlaProAspLeuSerLeu 
NNSIPNLPDFKEELDKWFKNQTSIAPDLSL 
GATTTCGAGAAGTTAAATGTTACTTTCCTGGACCTGACTTATGAGATGAACAGGATTCAGGATGCAATTAAGAAGTTAAATGAGAGCTAC
AspPheGluLysLeuAsnValThrPheLeuAspLeuThrTyrGluMetAsnArgIleGlnAspAlaIleLysLysLeuAsnGluSerTyr
DFEKLNVTFLDLTYEMNRIQDAIKKLNESY


ATCAACCTCAAGGAAGTTGGCACATATGAAATGTATGTGAAATGGCCTTGGTATGTTTGGTTGCTAATTGGTTTAGCTGGTGTAGCTGTT 
IleAsnLeuLysGluValGlyThrTyrGluMetTyrValLysTrpProTrpTyrValTrpLeuLeuIleGlyLeuAlaGlyValAlaVal
INLKEVGTYEMYVKWPWYVWLLIGLAGVAV

3601 TGTGTGTTATTATTCTTTATATGTTGCTGCACAGGTTGCGGCTCATGTTGTTTTAGAAAATGCGGAAGTTGTTGTGATGAGTATGGAGGA 3690
CysValLeuLeuPhePheIleCysCysCysThrGlyCysGlySerCysCysPheArgLysCysGlySerCysCysAspGluTyrGlyGly
CVLLFFICCCTGCGSCCFRKCGSCCDEYGG 


3691 CACCAGGACAGTATTGTGATACATAATATTTCAGCCCATGAGGATTGACTATCACAGCCTCTCCTGGAAAGACAGAAAATCTAAACAATT 3780 

HisGlnAspSerIleValIleHisAsnIleSerAlaHisGluAspEnd
HQDSIVIHNISAHED*

Fig. 2. Nucleotide sequence of the MHV surface protein gene and the predicted amino acid sequence of the surface protein precursor. The amino-terminal signal sequence (.), the carboxy-terminal transmembrane domain (-), the charge cluster (**), the cysteine-rich region (o), potential glycosylation sites (•) and the putative proteolytic cleavage site (-) are indicated.

procedure of Kyte & Doolittle (1982), reveals two regions of striking hydrophobicity (Fig. 3). At the amino terminus, the initiator methionine is followed by nine non-polar amino acids, and this hydrophobic core precedes a number of small neutral residues (e.g. Ser-11, Gly-14) which are characteristically found at the signal peptidase recognition site (Fig. 2) (Von Heijne, 1984). At the carboxy terminus of the propolypeptide, a sequence of 34 predominantly hydrophobic and neutral amino acids is followed by two positively charged amino acids (Arg-1209, Lys-1210), which are situated 25 amino acids from the carboxy terminus. Moreover, these ‘charge cluster’ residues are flanked by an unusual distribution of cysteine residues, which constitute 50% of the residues between positions 1198 and 1215. Not only an unusual distribution, but also a specific sequence, Cys-Gly-Ser-Cys-Cys, is found on either side of the ‘charge cluster’ (Fig. 2). Finally, the distribution of charged amino acids in the S propolypeptide (Fig. 3) indicates a particularly striking domain of basic residues, Arg-Arg-Ala-Arg-Arg, at positions 624 to 628.

I. SCHMIDT, M. SKINNER AND S. SIDDELL

Fig. 3. (a) Hydropathicity analysis of the S propolypeptide according to the method of Kyte & Doolittle (1982). The vertical scale is the average hydropathicity (+3 to −3) for a frame of seven amino acids. The midpoint line represents the average hydropathicity of the 20 amino acids. Hydrophobic sequences appear above the base line. The putative signal (S) and anchor (A) domains, as well as the potential glycosylation sites (#), are shown. (b) Charge analysis of the MHV S propolypeptide. The analysis sums charge values over a window of nine amino acids. The putative cleavage site is indicated (↑).
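The charge analysis of Fig. 3(b) sums charge values over a nine-residue window, and a minimal version is sketched below. It assumes +1 for Lys and Arg, −1 for Asp and Glu, and 0 for everything else. Treating His as uncharged is an assumption. The sketch runs on a single row of Fig. 2 (residues 621 to 650) rather than on the whole protein. To pinpoint the basic motif, it uses a regular expression for two basic residues, any residue, then two basic residues. That pattern is an illustrative choice, not a definition of the cleavage signal.

```python
import re
from typing import List

# +1 for Lys/Arg, -1 for Asp/Glu; His and all other residues scored 0 (assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charge_profile(seq: str, window: int = 9) -> List[int]:
    """Summed charge for every window of `window` consecutive residues."""
    values = [CHARGE.get(aa, 0) for aa in seq.upper()]
    return [sum(values[i:i + window]) for i in range(len(values) - window + 1)]


# Fig. 2 row beginning at residue 621 (one-letter code).
ROW_621 = "SKSRRARRSVSTGYRLTTFEPYMPMLVNDS"
FIRST = 621

profile = charge_profile(ROW_621)
peak = max(profile)
peak_start = FIRST + profile.index(peak)  # first window reaching the maximum

match = re.search(r"[KR]{2}.[KR]{2}", ROW_621)  # illustrative basic-motif pattern
motif_span = (FIRST + match.start(), FIRST + match.end() - 1) if match else None

print(peak, peak_start, match.group() if match else None, motif_span)
```

In a whole-protein scan, the same window would be slid along all 1235 residues and plotted, as in Fig. 3(b).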

DISCUSSION 

The DNA sequences presented in Fig. 2 encompass the entire ‘unique’ region of MHV-JHM mRNA 3. In common with the mRNAs for the virion proteins N and M, the 5' unique region of mRNA 3 is used to encode a single polypeptide, the precursor to the surface protein. In each of the mRNAs encoding MHV virion structural proteins, translation is initiated at the first AUG codon within the mRNA and the expressed ORF does not overlap with any downstream ORF present in the 3' co-terminal sequences. In mRNAs 3, 6 and 7 the initiating codon is found in a preferred context and follows closely (1, 4 and 8 bases, respectively) the ‘region of homology’ sequences which define the 5' ends of the mRNA bodies (Skinner & Siddell, 1983; Pfleiderer et al., 1986). These data further support the model of a non-overlapping translation strategy for coronavirus gene expression, at least for the virion structural proteins (Siddell, 1986; Brown et al., 1986).

The predicted amino acid sequence of the S propolypeptide, derived from the DNA sequence, 
reveals several interesting features. The predicted mol. wt. of the S propolypeptide is 136600. This agrees approximately with the size of the polypeptide found in tunicamycin-treated cells or produced by in vitro translation (Siddell, 1983). If the average mol. wt. of mannose-rich viral glycoprotein carbohydrate side chains is assumed to be 2000 to 3000, it seems likely that a considerable number of the 21 potential glycosylation sites are utilized, in order to account for the approximately 40000 to 50000 apparent mol. wt. difference between the glycosylated and non-glycosylated S polypeptides (Siddell, 1982).
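Two pieces of reasoning in this paragraph can be made explicit. The first is how potential sites of the type Asn-X-Thr/Ser (X not Pro) are recognized in a sequence. The second is the arithmetic behind "a considerable number" of sites. The sketch below scans one-letter sequences with an overlapping regular-expression search. It then estimates how many side chains the mass difference implies, assuming the entire 40000 to 50000 mol. wt. difference is due to N-linked side chains of 2000 to 3000 each. The resulting range is capped at the 21 available sites. The example sequences are rows taken from Fig. 2, and the function names are illustrative.

```python
import math
import re
from typing import List, Tuple

# Zero-width lookahead so that overlapping sequons (e.g. N-N-S-T) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str, first_residue: int = 1) -> List[int]:
    """Residue numbers of Asn in Asn-X-Ser/Thr motifs where X is not Pro."""
    return [first_residue + m.start() for m in SEQUON.finditer(protein.upper())]


def chains_needed(mass_diff: Tuple[float, float],
                  chain_mass: Tuple[float, float],
                  n_sites: int) -> Tuple[int, int]:
    """Fewest and most side chains that could account for the mass difference."""
    fewest = math.ceil(mass_diff[0] / chain_mass[1])  # smallest gap, heaviest chains
    most = math.floor(mass_diff[1] / chain_mass[0])   # largest gap, lightest chains
    return fewest, min(most, n_sites)


if __name__ == "__main__":
    print(find_sequons("CIQTVNYNGNNASAPSISTEAVDVSKGRGT", first_residue=21))
    print(find_sequons("HQDSIVIHNISAHED", first_residue=1221))
    print(chains_needed((40000, 50000), (2000, 3000), n_sites=21))
```

Under these assumptions, at least 14 of the 21 sites would have to carry side chains.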

The hydropathicity analysis indicates that the MHV S polypeptide belongs to the group 1 
membrane proteins (Garoff, 1985). These proteins are inserted across the endoplasmic 
reticulum membrane starting from their amino terminus using the same mechanism as secretory 
proteins. They therefore carry a typical amino-terminal hydrophobic signal sequence which is 
not present in the mature protein. A putative signal sequence is present in the predicted MHV S 
propolypeptide. The small neutral glycine residue at position 14 appears to be a possible signal 
peptidase cleavage site (Von Heijne, 1984). 

A second hydrophobic region at the carboxy terminus of the MHV S propolypeptide is 
characteristic of a transmembrane domain and is delineated from the hydrophilic cytoplasmic 
domain by the positively charged Arg/Lys residues at positions 1209/1210. Garoff and colleagues (Cutler & Garoff, 1986; Cutler et al., 1986) have tested the postulate that the hydrophobic transmembrane domain, together with the ‘charge cluster’ of the cytoplasmic domain, makes up a membrane binding region of group 1 proteins which acts as a ‘stop transfer signal’ to arrest translocation and prevent secretion. Their results, however, indicate that the ‘charge cluster’ is necessary only for stabilization of the protein-membrane interaction, and that the hydrophobic stretch alone is able to arrest translocation. Surrounding the charge cluster in the MHV S polypeptide are an unusual number of cysteine residues, which occur in a specific sequence context. It seems possible that these sequences are involved in the acylation of the S2 polypeptide, because acylation of the vesicular stomatitis virus G protein is believed to occur at cysteine residues in the vicinity of the hydrophobic transmembrane domain (Rose et al., 1984; McGee et al., 1984).
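The cysteine figures quoted in the Results can be checked directly against Fig. 2: 50% of residues 1198 to 1215, and a Cys-Gly-Ser-Cys-Cys sequence on either side of Arg-1209/Lys-1210. In the sketch below, the residue number of the first amino acid in the row labelled 3601 is derived from the AUG at base 31. The helper name is illustrative.

```python
import re

ATG_POS = 31          # first base of the initiating AUG (Fig. 2)
ROW_NT_START = 3601   # label of the Fig. 2 row used here
ROW_AA = "CVLLFFICCCTGCGSCCFRKCGSCCDEYGG"  # that row in one-letter code
PROTEIN_LENGTH = 1235

first = (ROW_NT_START - ATG_POS) // 3 + 1  # residue number of ROW_AA[0]


def segment(start: int, end: int) -> str:
    """Residues start..end (inclusive, protein numbering) taken from ROW_AA."""
    return ROW_AA[start - first:end - first + 1]


cys_window = segment(1198, 1215)
cys_fraction = cys_window.count("C") / len(cys_window)
charge_cluster = segment(1209, 1210)
cgscc_starts = [first + m.start() for m in re.finditer("CGSCC", ROW_AA)]

print(first, cys_window.count("C"), len(cys_window), cys_fraction)
print(charge_cluster, PROTEIN_LENGTH - 1210, cgscc_starts)
```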

During virus maturation, the MHV S polypeptide is cleaved by a host cell enzyme to yield the S1 and S2 polypeptides. The charge analysis of the MHV S polypeptide reveals a sequence Arg-Arg-Ala-Arg-Arg at positions 624 to 628 which is very similar to a number of basic sequences involved in the cleavage of several other enveloped RNA virus glycoproteins (White et al., 1983). Sturman & Holmes (1977) have shown that the MHV-A59 S polypeptide can be cleaved by trypsin, and it appears likely that coronaviruses, in common with many enveloped RNA viruses, utilize a cellular trypsin-like endoprotease activity to achieve proteolytic processing. It is not yet known whether a second enzyme, a carboxypeptidase, plays a role in the maturation of the coronavirus S protein, as has been shown for the influenza virus haemagglutinin (Garten & Klenk, 1983).

Following cleavage, the fusion properties of the MHV S protein are activated (Sturman et al., 1985). Examination of the MHV S polypeptide sequence shows that cleavage at the site mentioned above would not result in a strongly hydrophobic domain at the amino terminus of S2. This would contrast with the fusogenic myxovirus proteins, where the hydrophobicity of the amino terminus of, for example, the influenza virus HA2 protein is essential to the fusion process. Possibly, the mechanism of coronavirus-induced cell fusion involves either a less hydrophobic amino-terminal domain on S2 or other, as yet unidentified, hydrophobic domains on the S protein.
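The comparison between the signal sequence and the putative S2 amino terminus can be put in rough numerical terms using the Kyte & Doolittle (1982) scale and the seven-residue window described for Fig. 3. The sketch below assumes that cleavage occurs after Arg-628, so that S2 would begin at Ser-629. It uses only residues 629 to 650 as shown in Fig. 2 and is not a substitute for the full profile in Fig. 3.

```python
from typing import List

# Kyte & Doolittle (1982) hydropathy values.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq: str, window: int = 7) -> List[float]:
    """Mean hydropathy of every window of `window` residues (KeyError on unknown letters)."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


SIGNAL_REGION = "MLFVFILLLPSCLGYIGDFR"   # residues 1-20 (Fig. 2)
S2_START = "SVSTGYRLTTFEPYMPMLVNDS"      # residues 629-650, assuming cleavage after Arg-628

if __name__ == "__main__":
    for name, seq in (("signal", SIGNAL_REGION), ("S2 N-terminus", S2_START)):
        prof = hydropathy_profile(seq)
        print(f"{name}: first window {prof[0]:.2f}, maximum {max(prof):.2f}")
```

In this rough comparison, the signal region stays well above +3 while the putative S2 amino terminus starts below zero.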

Finally, it is of interest to compare the nucleic acid and predicted amino acid sequences of the MHV S protein gene reported here with the recently determined sequence of the avian infectious bronchitis coronavirus (IBV) S protein gene (Binns et al., 1985). At the nucleic acid level the sequences appear to be essentially unrelated. However, a comparison of the predicted amino acid sequences using the dot matrix program DIAGON (Staden, 1982b), which looks not only for identical residues but also for residues with similar properties, reveals a striking degree of similarity in the S2 polypeptide, but little similarity in the S1 region (Fig. 4). To some extent the similarities in the S2 region represent recognizable features, for example, the transmembrane domain, the cysteine-rich region or the putative cleavage site. Additionally, however, there are regions of amino acids with similar properties, and also specific sequences of identical amino acids (for example, the sequence Trp-Pro-Trp-Tyr-Val-Trp-Leu, positions 1184 to 1191 for MHV; 1092 to 1099 for IBV), the significance of which remains to be determined.

Fig. 4. DIAGON analysis (Staden, 1982b) of the homologies between the MHV-JHM S propolypeptide amino acid sequence and the IBV Beaudette S propolypeptide sequence (Binns et al., 1985). Points represent matches at the 1% significance level. The putative MHV cleavage site as well as the known IBV cleavage site (Cavanagh et al., 1986) are indicated (↑).
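As described above, DIAGON also credits residues with similar properties and plots only points that reach a chosen significance level. The simplified sketch below keeps the windowed dot-matrix idea but scores identities only, against an arbitrary threshold. It is therefore a stand-in for illustration, not a reimplementation of DIAGON. The two toy sequences are hypothetical; each carries the shared Trp-Pro-Trp-Tyr-Val-Trp-Leu stretch.

```python
from typing import List, Tuple


def dot_matrix(a: str, b: str, window: int = 7,
               min_identical: int = 5) -> List[Tuple[int, int]]:
    """1-based window starts (i, j) whose aligned windows share >= min_identical identities."""
    points = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            identical = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            if identical >= min_identical:
                points.append((i + 1, j + 1))
    return points


def render(a: str, b: str, points: List[Tuple[int, int]], window: int = 7) -> str:
    """Crude text plot: rows are windows of a, columns are windows of b."""
    marked = set(points)
    rows = []
    for i in range(1, len(a) - window + 2):
        rows.append("".join("*" if (i, j) in marked else "."
                            for j in range(1, len(b) - window + 2)))
    return "\n".join(rows)


if __name__ == "__main__":
    seq_a = "AAWPWYVWLAA"   # hypothetical
    seq_b = "GGGWPWYVWLG"   # hypothetical
    pts = dot_matrix(seq_a, seq_b, min_identical=6)
    print(pts)
    print(render(seq_a, seq_b, pts))
```

A shared stretch appears as a diagonal run of points, and lowering the threshold lengthens that run. A closer analogue of DIAGON would replace the identity test with a residue-similarity score and a statistically chosen cut-off.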

The cloning and sequencing of the MHV S protein gene is an important step in studies on the 
molecular biology of MHV and especially the interaction between the virus, the host cell and the 
immune system during infection. Experiments to investigate these interactions at the molecular 
level can now be undertaken. 

We would like to thank Barbara Schelle-Prinz for skilful technical assistance, Helga Kriesinger for typing the 
manuscript and Professor V. ter Meulen for continuous support. We would also like to thank R. Staden for 